63 research outputs found
A text-to-picture system for Russian language
This paper presents the motivation for and design of a general-purpose text-to-picture (TTP) synthesis system. The described TTP system is designed for Russian language processing and consists of a natural language analysis subsystem, a stage processing subsystem, and a rendering subsystem. Every processing stage is described and the basic design ideas of the system architecture are highlighted. A user study has been performed, and the reasons for further work are explained.
Best Prompts for Text-to-Image Models and How to Find Them
Recent progress in generative models, especially in text-guided diffusion
models, has enabled the production of aesthetically-pleasing imagery resembling
the works of professional human artists. However, one has to carefully compose
the textual description, called the prompt, and augment it with a set of
clarifying keywords. Since aesthetics are challenging to evaluate
computationally, human feedback is needed to determine the optimal prompt
formulation and keyword combination. In this paper, we present a
human-in-the-loop approach to learning the most useful combination of prompt
keywords using a genetic algorithm. We also show how such an approach can
improve the aesthetic appeal of images depicting the same descriptions.
Comment: 13 pages (6 main pages), 7 figures, 4 tables, accepted at SIGIR '23 Short Paper Track
Unsupervised Sense-Aware Hypernymy Extraction
In this paper, we show how unsupervised sense representations can be used to
improve hypernymy extraction. We present a method for extracting disambiguated
hypernymy relationships that propagates hypernyms to sets of synonyms
(synsets), constructs embeddings for these sets, and establishes sense-aware
relationships between matching synsets. Evaluation on two gold standard
datasets for English and Russian shows that the method successfully recognizes
hypernymy relationships that cannot be found with standard Hearst patterns and
Wiktionary datasets for the respective languages.
Comment: In Proceedings of the 14th Conference on Natural Language Processing (KONVENS 2018). Vienna, Austria
Eliminating fuzzy duplicates in crowdsourced lexical resources
Collaborative creation of lexical resources is a trending approach to building high-quality thesauri in a short time span at a remarkably low price. The key idea is to invite non-expert participants to express and share their knowledge with the aim of constructing a resource. However, this approach tends to be noisy and error-prone, which makes data cleansing a highly topical task. In this paper, we study different techniques for synset deduplication, including machine- and crowd-based ones. Finally, we put forward an approach that solves the deduplication problem fully automatically, with quality comparable to the expert-based approach.
Watset: automatic induction of synsets from a graph of synonyms
This paper presents a new graph-based approach that induces synsets using synonymy dictionaries and word embeddings. First, we build a weighted graph of synonyms extracted from commonly available resources, such as Wiktionary. Second, we apply word sense induction to deal with ambiguous words. Finally, we cluster the disambiguated version of the ambiguous input graph into synsets. Our meta-clustering approach lets us use an efficient hard clustering algorithm to perform a fuzzy clustering of the graph. Despite its simplicity, our approach shows excellent results, outperforming five competitive state-of-the-art methods in terms of F-score on three gold standard datasets for English and Russian derived from large-scale manually constructed lexical resources.
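The three steps named in the Watset abstract above (build a synonym graph, induce word senses, cluster the disambiguated graph) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several simplifying assumptions: the graph is unweighted, no word embeddings are used, connected components stand in for the hard clustering algorithm, and a neighbour's sense is chosen by simple context membership. The function names `connected_components` and `induce_synsets` are hypothetical.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Stand-in hard clustering (an assumption): connected components."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adj[node] - seen)
        components.append(component)
    return components


def induce_synsets(synonym_pairs):
    # Step 1: build an (unweighted, for simplicity) graph of synonyms.
    graph = defaultdict(set)
    for u, v in synonym_pairs:
        graph[u].add(v)
        graph[v].add(u)

    # Step 2: sense induction. Cluster each word's neighbourhood with the
    # word itself left out; every cluster is the context of one sense.
    senses = {}
    for word, neighbours in graph.items():
        ego_edges = [(a, b) for a in neighbours for b in graph[a] if b in neighbours]
        senses[word] = connected_components(sorted(neighbours), ego_edges)

    # Disambiguation: link sense (word, i) to the sense of each context word
    # whose own context contains `word` (a simplification of similarity
    # matching between contexts).
    sense_nodes, sense_edges = [], []
    for word, contexts in senses.items():
        for i, context in enumerate(contexts):
            sense_nodes.append((word, i))
            for other in context:
                j = next(k for k, c in enumerate(senses[other]) if word in c)
                sense_edges.append(((word, i), (other, j)))

    # Step 3: hard-cluster the sense graph, then drop the sense indices.
    # A word with several senses can appear in several synsets, which is
    # how a hard clustering yields a fuzzy clustering of the input graph.
    return [{word for word, _ in cluster}
            for cluster in connected_components(sense_nodes, sense_edges)]


pairs = [
    ("bank", "riverbank"), ("bank", "streambank"), ("riverbank", "streambank"),
    ("bank", "depository"), ("bank", "savings bank"), ("depository", "savings bank"),
]
for synset in induce_synsets(pairs):
    print(sorted(synset))
```

In this toy input the ambiguous word "bank" ends up in two synsets, one with the river-related words and one with the finance-related words.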
A spinning wheel for YARN: user interface for a crowdsourced thesaurus
The YARN (Yet Another RussNet) project, started in 2013, aims at creating a large open thesaurus for Russian using crowdsourcing. This paper describes the synset assembly interface developed within the project: the motivation behind it, its design, usage scenarios, implementation details, and first experimental results.
Graph clustering for natural language processing
Graph-based representations have proven to be an effective approach for a variety of Natural Language Processing (NLP) tasks. Graph clustering makes it possible to extract useful knowledge by exploiting the implicit structure of the data. In this tutorial, we will present several efficient graph clustering algorithms and show their strengths and weaknesses as well as their implementations and applications. Then, the evaluation methodology in unsupervised NLP tasks will be discussed.
Clustering Without Knowing How To: Application and Evaluation
Crowdsourcing allows running simple human intelligence tasks on a large crowd
of workers, making it possible to solve problems for which it is difficult to formulate an
algorithm or train a machine learning model in reasonable time. One such
problem is data clustering by an under-specified criterion that is simple for
humans, but difficult for machines. In this demonstration paper, we build a
crowdsourced system for image clustering and release its code under a free
license at https://github.com/Toloka/crowdclustering. Our experiments on two
different image datasets, dresses from Zalando's FEIDEGGER and shoes from the
Toloka Shoes Dataset, confirm that one can obtain meaningful clusters purely
with crowdsourcing, without any machine learning algorithms.
Comment: accepted at ECIR 2023 Demonstration Track